Two different robots#

The code for this example is implemented in different_robots. Let us import it.

[1]:
from enki_env.examples import different_robots

Environment#

The environment contains one Thymio and one E-Puck. Otherwise it is very similar to the previous “same robots” example: same task, same reward, just different robots with (slightly in this case) different sensors.

To create the environment via script, run:

python -m enki_env.examples.different_robots.environment
[2]:
env = different_robots.make_env(render_mode="human")
env.reset()
env.snapshot()

The robots belong to different groups with different observation spaces.

[3]:
env.group_map
[3]:
{'thymio': ['thymio_0'], 'e-puck': ['e-puck_0']}
[4]:
env.observation_spaces
[4]:
{'thymio_0': Dict('wheel_speeds': Box(-1.0, 1.0, (2,), float32), 'prox/value': Box(0.0, 1.0, (7,), float32)),
 'e-puck_0': Dict('wheel_speeds': Box(-1.0, 1.0, (2,), float32), 'prox/value': Box(0.0, 1.0, (8,), float32))}

Baseline#

We adapted the Thymio baseline to work for the E-Puck.

To evaluate the performance of both baselines via script, run:

python -m enki_env.examples.different_robots.baseline
[5]:
import inspect

print(inspect.getsource(different_robots.EPuckBaseline.predict))
    def predict(self,
                observation: Observation,
                state: State | None = None,
                episode_start: EpisodeStart | None = None,
                deterministic: bool = False) -> tuple[Action, State | None]:
        prox = np.atleast_2d(observation['prox/value'])
        m = np.max(prox, axis=-1)
        prox[m > 0] /= m[:, np.newaxis][m > 0]
        ws = np.array([(-0.1, -0.25, -0.5, -1, -1, 0.5, 0.25, 0.1)], dtype=np.float32)
        w = np.tensordot(prox, ws, axes=([1], [1]))
        w[m == 0] = 1
        return np.clip(w, -1, 1), None
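
The baseline is a Braitenberg-style controller: proximity readings are normalized by their maximum and combined with a fixed weight per sensor, so obstacles sensed by front sensors (large negative weights) push the command toward reversing, while zero readings everywhere yield full speed ahead. As a quick sanity check, here is a self-contained NumPy sketch of that weighting logic; the function name epuck_braitenberg is ours, and it drops the (action, state) tuple interface of the original predict:

```python
import numpy as np

def epuck_braitenberg(prox):
    """Map 8 E-Puck proximity values to a command in [-1, 1]."""
    prox = np.atleast_2d(np.asarray(prox, dtype=np.float32)).copy()
    m = np.max(prox, axis=-1)
    # Normalize each row by its maximum reading (skip all-zero rows).
    prox[m > 0] /= m[:, np.newaxis][m > 0]
    # Fixed per-sensor weights: front sensors weigh negative (back away).
    ws = np.array([(-0.1, -0.25, -0.5, -1, -1, 0.5, 0.25, 0.1)],
                  dtype=np.float32)
    w = np.tensordot(prox, ws, axes=([1], [1]))
    # No obstacle sensed anywhere: drive forward at full speed.
    w[m == 0] = 1
    return np.clip(w, -1, 1)

print(epuck_braitenberg(np.zeros(8)))       # no obstacle -> 1.0
front = np.zeros(8)
front[3] = 1.0
print(epuck_braitenberg(front))             # frontal obstacle -> -1.0
```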

To perform a rollout, we need to assign the policy to the whole group.

[6]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': different_robots.ThymioBaseline(),
                                                        'e-puck': different_robots.EPuckBaseline()})
[7]:
rollout.keys()
[7]:
dict_keys(['thymio', 'e-puck'])
[8]:
rollout['thymio'].episode_reward, rollout['e-puck'].episode_reward
[8]:
(np.float64(-2.969223195758214), np.float64(-13.845665476374068))
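
Assigning a policy per group means every agent in a group shares that policy, and the rollout is keyed by group. The dispatch pattern can be sketched in plain Python; the names forward_policy and the zero observations below are hypothetical stand-ins, not part of the enki_env API:

```python
import numpy as np

# Group map as reported by the environment above.
group_map = {'thymio': ['thymio_0'], 'e-puck': ['e-puck_0']}

def forward_policy(obs):
    # Dummy policy: always drive both wheels straight ahead.
    return np.ones(2, dtype=np.float32)

# One policy per group, shared by all agents in that group.
policies = {'thymio': forward_policy, 'e-puck': forward_policy}

# Per-agent observations with the group-specific prox sizes (7 vs 8).
observations = {'thymio_0': np.zeros(7, dtype=np.float32),
                'e-puck_0': np.zeros(8, dtype=np.float32)}

# Route each agent's observation to its group's policy.
actions = {agent: policies[group](observations[agent])
           for group, agents in group_map.items()
           for agent in agents}
print(sorted(actions))  # ['e-puck_0', 'thymio_0']
```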

Reinforcement Learning#

Let us now train and evaluate two RL policies for this task, one for each robot.

To perform this via script, run:

python -m enki_env.examples.different_robots.rl
[9]:
policies = different_robots.get_policies()
[10]:
policies.keys()
[10]:
dict_keys(['thymio', 'e-puck'])
[11]:
rollout = env.unwrapped.rollout(max_steps=10, policies=policies)
rollout['thymio'].episode_reward, rollout['e-puck'].episode_reward
[11]:
(np.float64(-3.5418375520177077), np.float64(-15.970891166964257))

Video#

To conclude, to generate a video like the one in the previous example, you can run

python -m enki_env.examples.different_robots.video

or run

[12]:
video = different_robots.make_video()
video.display_in_notebook(fps=30, width=640, rd_kwargs=dict(logger=None))